This is an interactive notebook. You can run it locally or use the links below:
Prerequisites
First, install the necessary libraries, set up API keys, log in to W&B, and create a new W&B project.
- Install `weave`, `pandas`, `unsloth`, `wandb`, `litellm`, `pydantic`, `torch`, and `faiss-gpu` using `pip`.
- Add the necessary API keys from your environment.
- Log in to W&B, and create a new project.
Download `ChatModel` from Models Registry and implement `UnslothLoRAChatModel`
In our scenario, the Llama-3.2 model has already been fine-tuned by the Model Team using the `unsloth` library for performance optimization, and is available in the W&B Models Registry. In this step, we'll retrieve the fine-tuned `ChatModel` from the Registry and convert it into a `weave.Model` to make it compatible with the `RagModel`.
The `RagModel` referenced below is a top-level `weave.Model` that can be considered a complete RAG application. It contains a `ChatModel`, a vector database, and a prompt. The `ChatModel` is also a `weave.Model`, which contains code to download an artifact from the W&B Registry. The `ChatModel` can be swapped modularly to support any other kind of LLM chat model as part of the `RagModel`. For more information, view the model in Weave. Inside the `ChatModel`, `unsloth.FastLanguageModel` or `peft.AutoPeftModelForCausalLM` with adapters is used, enabling efficient integration into the app. After downloading the model from the Registry, you can set up the initialization and prediction logic by using the `model_post_init` method. The required code for this step is available in the Use tab of the Registry and can be copied directly into your implementation.
The code below defines the `UnslothLoRAChatModel` class to manage, initialize, and use the fine-tuned Llama-3.2 model retrieved from the W&B Models Registry. `UnslothLoRAChatModel` uses `unsloth.FastLanguageModel` for optimized inference. The `model_post_init` method handles downloading and setting up the model, while the `predict` method processes user queries and generates responses. To adapt the code for your use case, update `MODEL_REG_URL` with the correct Registry path for your fine-tuned model and adjust parameters like `max_seq_length` or `dtype` based on your hardware or requirements.
Integrate the new `ChatModel` version into `RagModel`
Building a RAG application from a fine-tuned chat model improves conversational AI by using tailored components without having to rebuild the entire pipeline. In this step, we retrieve the existing `RagModel` from our Weave project and update its `ChatModel` to use the newly fine-tuned model. This seamless swap means that other components, such as the vector database (VDB) and prompts, remain untouched, preserving the application's overall structure while improving performance.
The code below retrieves the `RagModel` object using a reference from the Weave project. The `chat_model` attribute of the `RagModel` is then updated to use the new `UnslothLoRAChatModel` instance created in the previous step. After this, the updated `RagModel` is published to create a new version. Finally, the updated `RagModel` is used to run a sample prediction query, verifying that the new chat model is in use.
Run a `weave.Evaluation`
In the next step, we evaluate the performance of the updated `RagModel` using an existing `weave.Evaluation`. This process ensures that the new fine-tuned chat model performs as expected within the RAG application. To streamline integration and enable collaboration between the Models and Apps teams, we log evaluation results both to the model's W&B run and to the Weave workspace.
In Models:
- The evaluation summary is logged to the W&B run used to download the fine-tuned chat model. This includes summary metrics and graphs displayed in a workspace view for analysis.
- The evaluation trace ID is added to the run’s configuration, linking directly to the Weave page for easier traceability by the Model Team.
In Weave:
- The artifact or registry link for the `ChatModel` is stored as an input to the `RagModel`.
- The W&B run ID is saved as an extra column in the evaluation traces for better context.
The code below runs the evaluation of the updated `RagModel` and logs the results to both W&B and Weave. Ensure that the evaluation reference (`WEAVE_EVAL`) matches your project setup.
Save the new RAG Model to the Registry
To make the updated `RagModel` available to both the Models and Apps teams for future use, we push it to the W&B Models Registry as a reference artifact.
The code below retrieves the `weave` object version and name for the updated `RagModel` and uses them to create reference links. A new artifact is then created in W&B with metadata containing the model's Weave URL. This artifact is logged to the W&B Registry and linked to a designated registry path.
Before running the code, ensure the `ENTITY` and `PROJECT` variables match your W&B setup, and that the target registry path is correctly specified. This process finalizes the workflow by publishing the new `RagModel` to the W&B ecosystem for easy collaboration and reuse.